Our primary dataset is the GTZAN dataset, which spans ten genres with one hundred 30-second audio files per genre. For this analysis, I used the segmented version, in which each original audio file is partitioned into 3-second clips; this segmentation allows a more granular examination of the musical features embedded in the dataset. The data were collected in 2000-2001 from a variety of sources, including CDs, radio, and microphone recordings, to account for variation in recording conditions.
Lasso regression retained all features with non-zero coefficients, suggesting that each contributes to predicting musical genre. Even so, I chose to explore multicollinearity further to improve interpretability and gain deeper insight into the relationships within the data.
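A lasso fit of this kind can be sketched with `glmnet`; the sketch below uses synthetic stand-in features and genre labels (not the GTZAN data), and is guarded so it only runs when `glmnet` is installed. Setting `alpha = 1` selects the lasso penalty, and a multinomial family handles the ten-class response.

```r
# Hedged sketch of a multinomial lasso fit; x and y are stand-ins for the
# GTZAN feature matrix and genre labels. Runs only if glmnet is available.
if (requireNamespace("glmnet", quietly = TRUE)) {
  library(glmnet)
  set.seed(5)
  x <- matrix(rnorm(300 * 20), ncol = 20)                  # stand-in features
  y <- factor(sample(c("blues", "rock", "jazz"), 300, replace = TRUE))

  cv_fit <- cv.glmnet(x, y, family = "multinomial", alpha = 1)  # alpha = 1 -> lasso
  coefs <- coef(cv_fit, s = "lambda.min")   # one coefficient vector per class
}
```

Features whose coefficients are shrunk exactly to zero at `lambda.min` would be candidates for removal; here, all features survived.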
During exploratory data analysis, I found significant correlations among variables. The plot below illustrates the correlation coefficients between the first 25 numeric variables in the dataset (excluding constants).
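The correlation screen behind that plot can be reproduced with base R alone. In this sketch, `df` is a synthetic stand-in for the GTZAN feature table; the real analysis used the dataset's numeric columns.

```r
# Pairwise correlations among the first 25 numeric, non-constant columns.
# df is a stand-in for the GTZAN feature table.
set.seed(4)
df <- as.data.frame(matrix(rnorm(100 * 30), ncol = 30))

num_cols <- df[sapply(df, is.numeric)]
num_cols <- num_cols[, sapply(num_cols, function(v) sd(v) > 0), drop = FALSE]
cor_mat <- cor(num_cols[, 1:min(25, ncol(num_cols))])

# flag strongly correlated off-diagonal pairs (|r| > 0.8)
high <- which(abs(cor_mat) > 0.8 & upper.tri(cor_mat), arr.ind = TRUE)
```

`cor_mat` is what a correlation heatmap visualizes; `high` lists the feature pairs worth inspecting for redundancy.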
The line plot highlights a strong relationship (r = 0.89) between mean spectral bandwidth and mean spectral centroid across genres. Through interactive filtering, you can refine the view by clicking (or double-clicking) the genre labels on the plot's right side. Filtering results by genre allows a more focused, comparative look at genre-specific patterns.
These observations hint at potential redundancy or multicollinearity within the dataset. To address this issue, I explored dimensionality reduction techniques aimed at enhancing robustness and reducing noise.
Capturing 90% of the variance yields a modest reduction in dimensionality, from 57 features to 31 principal components.
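The component count can be read off the cumulative variance of a PCA fit. The sketch below uses synthetic correlated data with 57 columns as a stand-in for the real features (so the resulting count will differ from the 31 found on the actual dataset):

```r
# Sketch: how many principal components are needed to retain 90% variance.
# The 200 x 57 matrix is a synthetic stand-in for the GTZAN features.
set.seed(42)
latent <- matrix(rnorm(200 * 5), ncol = 5)            # 5 underlying factors
X <- latent %*% matrix(rnorm(5 * 57), nrow = 5) +     # 57 correlated features
  matrix(rnorm(200 * 57, sd = 0.5), ncol = 57)

pca <- prcomp(X, center = TRUE, scale. = TRUE)
cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
n_components <- which(cum_var >= 0.90)[1]
```

Squaring `pca$sdev` gives the variance explained by each component; the first index where the cumulative share crosses 0.90 is the reduced dimension.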
The lack of clear separation observed in the PCA plot suggests that the genres of music are not easily distinguishable within the reduced-dimensional space defined by the principal components. This indicates potential complexity within the data or significant overlap between classes in terms of their features. Given this observation, which implies that the data does not lend itself well to PCA, I will not utilize the reduced dataset to aid in classification.
The plot below illustrates the distribution of test accuracies across musical genres obtained using Linear Discriminant Analysis (LDA) from fifty train-test (80-20 split) iterations. The red dashed line marks the overall accuracy of LDA, which was 67.22%.
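The repeated-split procedure can be sketched as follows; here `iris` stands in for the genre feature data, so the accuracies are illustrative only (the 67.22% figure comes from the actual dataset):

```r
library(MASS)  # lda(); iris stands in for the GTZAN feature data

set.seed(1)
accs <- replicate(50, {
  idx  <- sample(nrow(iris), size = floor(0.8 * nrow(iris)))  # 80-20 split
  fit  <- lda(Species ~ ., data = iris[idx, ])
  pred <- predict(fit, iris[-idx, ])$class
  mean(pred == iris[-idx, "Species"])   # test accuracy for this split
})
mean(accs)  # overall accuracy across the 50 splits
```

Collecting `pred` and the true labels from each iteration is what builds the prediction dataframe described below.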
This dataframe comprises all predictions collected across the fifty train-test iterations. It contains a total of 99,900 observations.
| predicted_label | true_label |
|---|---|
| rock | blues |
| blues | blues |
| reggae | blues |
| blues | blues |
| reggae | blues |
| blues | blues |
The plot below illustrates the distribution of test accuracies across musical genres obtained using Quadratic Discriminant Analysis (QDA) from fifty train-test (80-20 split) iterations. The red line displays the overall accuracy, which was 76.81%.
Following experimentation with diverse layer structures and activation functions (e.g., sigmoid instead of ReLU), this architecture yielded the most favorable predictive outcomes:
```r
# Two hidden ReLU layers feeding a 10-class softmax output
nn_mod <- keras_model_sequential() %>%
  layer_dense(units = 128, activation = "relu", input_shape = c(57)) %>%
  layer_dense(units = 64, activation = "relu") %>%
  layer_dense(units = 10, activation = "softmax")
```
I conducted a grid search across the parameter space defined by the batch sizes (8, 16, 32, 64) and epochs (10, 15, 20, 25) to determine the configuration that minimizes validation loss. A batch size of 16 paired with 15 epochs was found to be the most effective.
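The mechanics of that search can be sketched in base R. Here `val_loss_for()` is a placeholder for refitting the network at each setting and reading off its minimum validation loss (the real run used keras with `validation_split`); because the placeholder returns random values, the selected row is illustrative only.

```r
# Sketch of the grid-search mechanics over batch size and epoch count.
grid <- expand.grid(batch_size = c(8, 16, 32, 64), epochs = c(10, 15, 20, 25))

val_loss_for <- function(batch_size, epochs) {
  # placeholder: in the real run, refit nn_mod with these settings and
  # return min(history$metrics$val_loss)
  runif(1)
}

set.seed(7)
grid$val_loss <- mapply(val_loss_for, grid$batch_size, grid$epochs)
best <- grid[which.min(grid$val_loss), c("batch_size", "epochs")]
```

`expand.grid` enumerates all 16 combinations, and `which.min` picks the row with the lowest validation loss; on the actual data this was batch size 16 with 15 epochs.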
The following plots display the results obtained during the training of the neural network, using the optimal hyperparameters. Ultimately, the trained model demonstrated a test accuracy ranging from 84% to 87%.
After standardizing the features through centering and scaling, I optimized the k parameter for the K-nearest neighbors (KNN) algorithm. Utilizing 5-fold cross-validation, I evaluated multiple k values to pinpoint the one that minimizes classification error.
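The tuning loop can be sketched with `class::knn` and a manual 5-fold split; the synthetic `X` and `y` below stand in for the scaled GTZAN features and genre labels, so the selected k is illustrative (on the real data it was 1):

```r
library(class)  # knn(); synthetic data stands in for the scaled features

set.seed(2)
n <- 300
X <- scale(matrix(rnorm(n * 10), ncol = 10))   # centered and scaled features
y <- factor(sample(c("blues", "rock", "jazz"), n, replace = TRUE))

ks    <- c(1, 3, 5, 7, 9)
folds <- sample(rep(1:5, length.out = n))      # random 5-fold assignment
cv_error <- sapply(ks, function(k) {
  mean(sapply(1:5, function(f) {
    pred <- knn(X[folds != f, ], X[folds == f, ], y[folds != f], k = k)
    mean(pred != y[folds == f])                # misclassification rate on fold f
  }))
})
best_k <- ks[which.min(cv_error)]
```

Each candidate k gets an error averaged over the five held-out folds, and the k with the smallest average error is kept.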
After partitioning the dataset into training and testing subsets with an 80-20 ratio, I fit the K-nearest neighbors (KNN) model using the optimal value k = 1. This model achieved an accuracy of 90.69%.
| | blues | classical | country | disco | hiphop | jazz | metal | pop | reggae | rock |
|---|---|---|---|---|---|---|---|---|---|---|
| blues | 178 | 1 | 4 | 3 | 0 | 5 | 1 | 0 | 2 | 3 |
| classical | 0 | 171 | 3 | 2 | 1 | 10 | 0 | 0 | 0 | 0 |
| country | 5 | 4 | 168 | 2 | 2 | 2 | 0 | 7 | 3 | 5 |
| disco | 4 | 1 | 0 | 191 | 1 | 0 | 0 | 7 | 3 | 7 |
| hiphop | 0 | 0 | 0 | 0 | 186 | 0 | 0 | 3 | 0 | 3 |
| jazz | 3 | 6 | 4 | 1 | 0 | 173 | 0 | 1 | 0 | 2 |
| metal | 1 | 0 | 1 | 3 | 1 | 0 | 200 | 0 | 0 | 1 |
| pop | 0 | 0 | 0 | 2 | 1 | 1 | 0 | 164 | 2 | 4 |
| reggae | 9 | 0 | 5 | 2 | 4 | 1 | 1 | 3 | 197 | 5 |
| rock | 3 | 2 | 11 | 3 | 0 | 2 | 3 | 3 | 1 | 184 |
| | Value |
|---|---|
| Accuracy | 0.9069 |
| Kappa | 0.8965 |
| AccuracyLower | 0.8933 |
| AccuracyUpper | 0.9193 |
| AccuracyNull | 0.1071 |
| AccuracyPValue | 0.0000 |
| McnemarPValue | NaN |
I conducted Leave-One-Out Cross-Validation (LOOCV) employing the KNN algorithm with the optimal k value of 1 on the dataset. The resulting accuracy was 92.22%. Notably, LOOCV offers a more thorough validation approach compared to traditional 80-20 splits, as it evaluates the model’s performance on every single data point, providing a comprehensive assessment of its generalization capability.
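`class::knn.cv` performs this leave-one-out procedure directly: each observation is classified using all of the others as the training set. The sketch below runs on synthetic stand-in data rather than the GTZAN features, so its accuracy is illustrative only:

```r
library(class)  # knn.cv() performs leave-one-out cross-validation

set.seed(3)
X <- scale(matrix(rnorm(200 * 10), ncol = 10))   # stand-in for the features
y <- factor(sample(c("blues", "rock"), 200, replace = TRUE))

loocv_pred <- knn.cv(X, y, k = 1)   # each point predicted from all others
loocv_acc  <- mean(loocv_pred == y)
```

With k = 1, each clip is assigned the genre of its single nearest neighbor among all remaining clips.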
| | blues | classical | country | disco | hiphop | jazz | metal | pop | reggae | rock |
|---|---|---|---|---|---|---|---|---|---|---|
| blues | 945 | 4 | 28 | 1 | 1 | 15 | 5 | 0 | 6 | 8 |
| classical | 1 | 933 | 7 | 6 | 2 | 46 | 1 | 0 | 1 | 5 |
| country | 15 | 8 | 841 | 15 | 5 | 17 | 0 | 17 | 12 | 26 |
| disco | 3 | 1 | 17 | 940 | 9 | 4 | 2 | 35 | 6 | 38 |
| hiphop | 4 | 1 | 8 | 5 | 946 | 2 | 3 | 17 | 5 | 5 |
| jazz | 8 | 41 | 20 | 4 | 1 | 902 | 1 | 5 | 3 | 8 |
| metal | 0 | 1 | 2 | 2 | 4 | 0 | 968 | 1 | 0 | 14 |
| pop | 1 | 1 | 11 | 6 | 12 | 3 | 0 | 904 | 6 | 7 |
| reggae | 14 | 0 | 36 | 4 | 16 | 4 | 1 | 14 | 956 | 9 |
| rock | 9 | 8 | 27 | 16 | 2 | 7 | 19 | 7 | 5 | 878 |
| | Value |
|---|---|
| Accuracy | 0.9222 |
| Kappa | 0.9136 |
| AccuracyLower | 0.9168 |
| AccuracyUpper | 0.9274 |
| AccuracyNull | 0.1001 |
| AccuracyPValue | 0.0000 |
| McnemarPValue | 0.0001 |
After conducting classification experiments on musical genres with LDA, QDA, neural networks, and KNN, I found that KNN with k = 1 yields the highest accuracy, 92.22%. While musical genre is somewhat subjective, this model excels at recognizing the intricate patterns embedded in the audio features associated with each genre, underscoring the effectiveness of KNN for genre classification and the discriminative power of these audio features across diverse musical genres.